| word | cosine similarity |
|---|---|
| awful | 1.0 |
| lame | 1.0 |
| alcoholic | 1.0 |
| sadly | 0.99 |
| relevant | 0.99 |
| are | 0.99 |
| reference word | closest words |
|---|---|
| awful | lame, alcoholic, sadly, relevant, are |
| mediocre | effort, turkey, terrible, stereotype, repeat |
| perfect | lovers, sing, manager, bath, donald |
| favorite | deeply, roud, marie, polanski, poetry |
| reference word | closest words |
|---|---|
| awful | ultimately, painful, sorry, fake, nowhere |
| mediocre | teeth, incompetent, main, disappointing, generous |
| perfect | great, seeking, freedom, tremendous, excellent |
| favorite | paulie, excellent, necessary, great, seeking |
The development and analysis of the word embedding model for classifying IMDB movie reviews demonstrated promising results.
The optimal number of embedding dimensions was identified as 7, achieving an accuracy of 87.34% on the test data set.
This was determined through extensive experimentation, revealing that higher dimensions, such as 7, provided competitive and consistent accuracy.
The model’s performance is noteworthy, given the constraints of training on only the top 5000 most common words, minimal data preprocessing, and limiting input sequences to the first 500 words of each review.
These factors illustrate the model’s robustness and effectiveness in capturing the semantic relationships within the data.
The embedding similarity results showed that the model could meaningfully capture semantic relationships, as evidenced by the coherent and relevant similar words found for terms like “awful,” “mediocre,” “perfect,” and “favorite.” The final training session, capped at 10 epochs, ensured the model did not overfit, maintaining its accuracy and reliability.
Overall, the model’s strong performance under constrained conditions highlights its potential for practical applications in sentiment analysis, offering an efficient and effective solution for understanding and categorizing movie reviews.